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Abstract. We propose a novel molecular computing scheme for statistical inference. We focus 
on the much-studied statistical inference problem of computing maximum likelihood estima¬ 
tors for log-linear models. Our scheme takes log-linear models to reaction systems, and the 
observed data to initial conditions, so that the corresponding equilibrium of each reaction 
system encodes the corresponding maximum likelihood estimator. The main idea is to exploit 
the coincidence between thermodynamic entropy and statistical entropy. We map a Maximum 
Entropy characterization of the maximum likelihood estimator onto a Maximum Entropy 
characterization of the equilibrium concentrations for the reaction system. This allows for an 
efficient encoding of the problem, and reveals that reaction networks are superbly suited to 
statistical inference tasks. Such a scheme may also provide a template to understanding how in 
vivo biochemical signaling pathways integrate extensive information about their environment 
and history. 


1 Introduction 

The sophisticated behavior of cells emerges from the computations that are being performed by 
the underlying biochemical reaction networks. These biochemical pathways have been studied in 
a “top-down” manner, by looking for recurring motifs, and signs of modularity |18l . There is also 
an opportunity to study these pathways in a “bottom-up” manner by proposing primitive building 
blocks which can be composed to create interesting and technologically valuable behavior. This 
“bottom-up” approach connects with work in the Molecular Computation community whose goal 
is to generate sophisticated behavior using DNA hybridization reactions [2312411 9131 12716I7I22I3I251 
and other Artificial Chemistry approaches m- 

We propose a new building block for molecular computation. We show that the mathematical 
structure of reaction networks is particularly well adapted to compute Maximum Likelihood Estima¬ 
tors for log-linear models, allowing a pithy encoding of such computations by reactions. According 

to H2]: 


Log-linear models are arguably the most popular and important statistical models for 
the analysis of categorical data; see, for example, Bishop, Fienberg and Holland (1975) [4], 
Christensen (1997) [8]. These powerful models, which include as special cases graphical mod¬ 
els [see, e.g., Lauritzen (1996) [ljj] as well as many logit models [see, e.g., Agresti (2002) [T], 
Bishop, Fienberg and Holland (1975) [3]], have applications in many scientific areas, ranging 
from social and biological sciences, to privacy and disclosure limitation problems, medicine, 
data mining, language processing and genetics. Their popularity has greatly increased in the 
last decades... 






In order to respond in a manner that maximizes fitness, a cell has to correctly estimate the 
overall state of its environment. Receptors that sit on cell walls collect a large amount of information 
about the cellular environment. Processing and integration of this spatially and temporally extensive 
and diverse information is carried out in the biochemical reaction pathways. We propose that this 
processing and integration may be advantageously viewed from the lens of machine learning. 

Our proposal entails that schemes for statistical inference by reaction networks are of biologi¬ 
cal significance, and are deserving of as thorough and extensive a study as schemes for statistical 
inference by neural networks. In particular, machine learning is not just a tool for the analysis of 
biochemical data, but theoretical and technological insights from machine learning could provide a 
deep and fundamental way, and perhaps “the” correct way, to think about biochemical networks. 
We view the scheme we present here as a promising first step in this program of applying machine 
learning insights to biochemical networks. 

The problem: We illustrate the main ideas of our scheme with an example. Following EQ, consider 
the log-linear model (also known as toric model) described by the design matrix A = ( 0 ) 2 )- 
This means that we are observing an event with three possible mutually exclusive outcomes, call 
them X 3 ,X 2 , and X 3l which represent respectively the columns of A. The rows of A represent 
“hidden variables” 9 1 and 9 2 respectively which parametrize the statistics of the outcomes in the 
following way specified by the columns of A: 

P[X 1 | 0i,0 2 ] oc0? 

P[X 2 | 0i,0 2 ] ocM 2 

P[X 3 | 0i, 0 2 ] oc0| 

where the constant of proportionality normalizes the probabilities so they sum to 1.0 

Suppose several independent trials are carried out, and the outcome X\ is observed x 3 £ (0,1) 
fraction of the time, the outcome X 2 is observed x 2 € (0,1 — x\) fraction of the time, and the 
outcome X 3 is observed x 3 = 1 — x± — x 2 fraction of the time. We wish to find the maximum 
likelihood estimator (9i,9 2 ) £ R ?, 0 of the parameter ( 6 * 1 , 9 2 ), i.e., that value of 6 which maximizes 
the likelihood of the observed data. 

Our contribution: We describe a scheme that takes the design matrix A to a reaction network 
that solves the maximum likelihood estimation problem. In Definition [HI we describe our scheme for 
every matrix A over the integers with all column sums equal. All our results hold in this generality. 

— In Definition 18151 we show how to obtain from the matrix A , a reaction network that computes 
the maximum likelihood distribution. Specialized to our example, note that the kernel of the 
matrix A is spanned by the vector (1, —2,1) T . We encode this by the reversible reaction 

X x + X 3 ^ 2 X 2 
1 

— In Theorem^ we show that if this reversible reaction is started at initial concentrations Xi(0) = 
x\, X 2 ( 0 ) = x 2 , ^ 3 ( 0 ) = *3, and the dynamics proceeds according to the law of mass action with 
all specific rates set to 1 : 

Xi(t) = X 3 (t) = -Xi{t)X 3 (t) + Xl{t), X 2 (t) = -2 Xl{t) + 2 X 1 (t)X 3 (t) 

1 It is more common in statistics and statistical mechanics literature to write 9\ = e~ El and #2 = e~ E2 in 
terms of “energies” Ei,E 2 so that P[A' 2 | Ei,E 2 ] oc q ~ Ei ~ e2 for example. 
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then the reaction reaches equilibrium (iq, X 2 , £ 3 ) where £\ + £2 + £3 = 1 and £\ oc Of, £2 oc 
0102 , and £3 oc 0 |, so that (aq, £2^3) represents the probability distribution over the outcomes 
Xi, X2, X3 at the maximum likelihood 0 i, 02 - 

— This part of our scheme involves only reversible reactions, and requires no catalysis (see [131 
Theorem 5.2] and Lemma [2]). One difficulty with implementing such schemes has been that 
empirical control over kinetics is rather poor. Exquisitely setting the specific rates of individual 
reactions to desired values is very tricky, and requires a detailed understanding of molecular 
dynamics. Our scheme avoids this problem since any choice of specific rates that leads to the 
same equilibrium will do. Hence we can freely set the specific rates so long as the equilibrium 
constants (ratio of forward and backward specific rates) have value f. This is an equilibrium 
thermodynamic condition that is much easier to ensure in vitro. This combination of reversible 
reactions, no catalysis, and robustness to the values of the specific rates may make this scheme 
particularly easy and efficient to implement. 

— In Definition 18121 we show how to obtain from the matrix A a reaction network that computes 
the maximum likelihood estimator. Specialized to our example, we obtain the reaction network 
with 5 species Xi, X 2 , X 3 , 0i, 62 and the 5 reactions: 

X!+X 3 ^2X2, 20 i —>■ 0 , Xi->X 1 + 29 1 , 

01+02 —^ 0 , X 2 —>■ X 2 + 01 + 02 - 

The number of species equals the number of rows plus the number of columns of A. The reactions 
are not uniquely determined by the problem, but become so once we choose a basis for the kernel 
of A and a maximal linearly independent set of columns. Here we have chosen columns 1 and 2. 
Each column of A determines a pair of irreversible reactions. 

— Theorem [ 6 ] implies that if this reaction system is launched at initial concentrations Xi(0) = 
aq,X 2 (0) = X 2 ,X 3 ( 0 ) = X 3 and arbitrary concentrations of 0i(O) and 02 ( 0 ), and the dynamics 
proceeds according to the law of mass action with all specific rates set to 1 : 

Xi(t) = X 3 (t) = -Xi (t)X 3 (t) + Xf(f), 0i(t) = -20?(i) + 2Xi (t) - 0i(f)0 2 (t) + X 2 (t), 

x 2 (t) = —2Xf (t) + 2X1 (t)x 3 (t% e 2 {t) = -0i e 2 {t) + x 2 (t), 

then the reaction reaches equilibrium (aq, + 2 , + 3 ,0i, 02 ) where (0i, 02 ) is the maximum likelihood 
estimator for the data frequency vector (xi,X2,X3) and (£i,£2,£s) represents the probability 
distribution over the outcomes Xi,X 2 ,X 3 at the maximum likelihood. We prove global conver¬ 
gence: our dynamical system provably converges to the desired equilibrium. Global convergence 
results are known to be notoriously hard to prove in reaction network theory m- 

— A number of schemes have been proposed for translating reaction networks into DNA strand 
displacement reactions I27I22I6I7I . Adapting these schemes to our setting should allow molecular 
implementation of our MLE-solving reaction networks with DNA molecules. 

2 Maximum Likelihood Estimation in toric models 

The definitions and results in this section mostly follow m- Because we require a slightly stronger 
statement, and Theorem [T] allows a short, easy, and insightful proof, we give the proof here for 
completeness. 
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In statistics, a parametric model consists of a family of probability distributions, one for each 
value of the parameters. This can be described as a map from a manifold of parameters into a 
manifold of probability distributions. If this map can be described by monomials as below, then the 
parametric statistical model is called a toric or log-linear model, as we now describe. 

Definition 1 (Toric Model). Let m, n be positive integers. The probability simplex and its relative 
interior are: 

A n := {(x 1 ,x 2 ,-..,x n ) Sl> 0 \x!+x 2 -\ - Vx n = 1} 

ri(Z\ n ) := {(x!,x 2 , ...,x n ) G R> 0 | xi + x 2 H-b x n = 1}. 

An m x n matrix A = (a,j) mX n of integer entries is a design matrix iff all its column sums JA Oij 
are equal. Let aj := ( aij 1 a 2 j,...,a r nj) T be the j’th column of A. Define 9 aj := ...Off 13 . 

Define the parameter space 0 := {6 G R> 0 | 9 ai + 9 a2 + • • • + 9 an = 1}. The toric model of A is 
the map 

p A = [p\,P 2 , ■ ■ ■ ,Pn) : 6 > Z\” given by pj(9) = 9 aj for j = 1 to n. 

We could also have defined the parameter space 0 to be all of K> 0 , in which case we would need 
to normalize the probabilities by the partition function 9 ai + 9 a2 + • • • + 9 an to make sure they add 
up to 1. For our present purposes, the current approach will prove technically more direct. 

Note that here Pj(9) specifies Pr[j | 9], the conditional probability of obtaining outcome j given 
that the true state of the world is described by 9. 

A central problem of statistical inference is the problem of parameter estimation. After per¬ 
forming several independent identical trials, suppose the data vector u G Z" 0 is obtained as a 
record of how many times each outcome occurred. Let the norm |w|i := Ui + u 2 + ■ ■ ■ + u n denote 
the total number of trials performed. The Maximum Likelihood solution to the problem of pa¬ 
rameter estimation finds that value of the parameter 9 which maximizes the likelihood function 
f u {9) := Pr[u | 9\, i.e.: 

9{u) := argsup/ u ( 6 >) (1) 

9gO 

is a maximum likelihood estimator or MLE for the data vector u. We will call the point p[u) := 

Pa(9(u)) a maximum likelihood distribution. 

Definition 2. Let A be an m x n design matrix, and u a data vector. Then the sufficient polytope 
is P A {u) := {p G vi(A n ) \ Ap = A-^}. 

The following theorem is a version of Birch’s theorem from Algebraic Statistics. It provides a 
variational characterization of the maximum likelihood distribution as the unique maximum entropy 
distribution in the sufficient polytope. In particular the maximum likelihood distribution always 
belongs to the sufficient polytope, which justifies the name. 

Theorem 1 . Fix a design matrix A of size m x n. 

1. If u,v G Z” 0 are nonzero data vectors such that Au/\u\\ = Au/|u|i then they have the same 

maximum likelihood estimator: 9{u) = 9(v). 

2. Further if Pa (it) is nonempty then 

(a) There is a unique distribution p G Pa{u) which maximizes Shannon entropy H(p) = — Pi l°g Pi 
viewed as a real-valued function from the closure Pa(u) of Pa{u) with OlogO defined as 0. 
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(b) {p} = Pa{u) n p A {@)■ 

(c) p = p(u), the Maximum Likelihood Distribution for the data vector u. 

Proof. 1. Fix a data vector u. Note that f u {9) = u pu 2 i[ uJ Pi{9) Ul P2(9) U2 .. .p n (9) Un = u pu 2 t.. Un \ 9 Au . 
Therefore the maximum likelihood estimator 

9{u) = argsup 0 Au = arg sup(0 A “) 1/|u|1 = argsup e Au '^ 

8e0 see ee© 

where the second equality is true because the function x K > x c is monotonically increasing whenever 
c > 0. It follows that if v £ Z> 0 is a data vector such that Au/\u\i = Av/\v\i then 9(u) = 9(v). 

2.(a) Suppose Pa(u ) is nonempty. A local maximum of the restriction H\ Pa ^ of H to the polytope 

Pa{u) can not be on the boundary 8 Pa(u) because for p £ 3 Pa{u), moving in the direction of 
arbitrary q £ Pa(u ) increases H , as can be shown by a simple calculation: 

lim 1 — A )p + A q) —> +oo. 

A — >o d\ 

Since H is a continuous function and the closure Pa(u) is a compact set, H must attain its maximum 
value in Pa{u). Further H is a strictly concave function since its Hessian is diagonal with entries 
—1 /pi and hence negative definite. It follows that H | p A ( M ) is also strictly concave, and has a unique 
local maximum at p £ Pa (u) , which is also the global maximum. 

(b) By concavity of H, the maximum p is the unique point in Pa(u) such that VH(p) is perpendicu¬ 
lar to Pa(u). We claim that q £ Pa{u)C\pa(0) iff VH(q) = (—1 — loggi, — 1 — log q 2 ,..., — 1 — log< 7 „) 
is perpendicular to Pa{u). Since all column sums are equal, this is equivalent to requiring that logg 
be in the span of the rows of A, which is true iff q £ pa(0). Hence Pa(u) C\pa{0) = {p}- 

(c) To compute the Maximum Likelihood Distribution p(u), we proceed as follows: 

p{u) = pa{9{u)) = PA(a,rgsup 9 Au ) = p A (arg sup 6> A “ /Hl ) 

0 e© 9g0 

n 

= p^(arg sup 9 a v) = arg sup p? = arg sup /^ y Pi log Pi = p 
8G0 pdpA (0) p£pa(0) 

where the fourth equality uses Ap = Au/\u\i and the last equality follows because X]"=i Pi^°SPi 
viewed as a function of p attains its maximum in all of A n , and hence in pa(@), at p = p. 

This theorem already exposes the core of our idea. We will design reaction systems that maxi¬ 
mize entropy subject to the “correct” constraints capturing the polytope Pa(u). Then because the 
reactions also proceed to maximize entropy, the equilibrium point of our dynamics will correspond 
to the maximum likelihood distribution. Most of the technical work will go in proving convergence 
of trajectories to these equilibrium points. 

3 Reaction Networks 

According to [20l, “In building a design theory for chemistry, chemical reaction networks are usually 
the most natural intermediate representation - the middle of the hourglass m- Many different high 
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level languages and formalisms have been and can likely be compiled to chemical reactions, and 
chemical reactions themselves (as an abstract specification) can be implemented with a variety of 
low level molecular mechanisms.” 

In Subsection 13.11 we recall the definitions and results for reaction networks which we will need 
for our main results. For a comprehensive presentation of these ideas, see m In Subsection 13.21 we 
prove a new result in reaction network theory. We extend a previously known global convergence 
result to the case of perturbations. 

3.1 Brief review of Reaction Network Theory 

For vectors a = (ai)i^s and b = the notation a b will be shorthand for the formal monomial 

IUstf. We introduce some standard definitions. 

Definition 3 (Reaction Network). 

Fix a finite set S of species. 

1. A reaction over S is a pair ( y,y') such that y,y' £ Z> 0 . It is usually written y —» y', with 

reactant y and product y'. 

2. A reaction network consists of a finite set S of species, and a finite set TZ of reactions. 

3. A reaction network is reversible iff for every reaction y —> y' £ TZ, the reaction y' —> y £ 1Z. 

f. A reaction network is weakly reversible iff for every reaction y —>• y' £ 1Z there exists a positive 
integer n £ Z>o and n reactions yi —> yi, 2/2 —>• 2 / 3 , •.., y n -1 —> y n £ TZ with yi = y' and y n = y. 

5. The stoichiometric subspace H C R s is the subspace spanned by {y' — y \ y —>• y' £ TZ}, and 
H x is the orthogonal complement of H. 

6. A siphon is a set T C S of species such that for all y —> y' £ TZ, if there exists i £ T such that 
y\> 0 then there exists j £ T such that yj >0. 

7. A siphon T C S is critical iff v £ H 1 - n with Vi = 0 for all i ^ T implies v = 0. 

Definition 4. Fix a weakly reversible reaction network (S,TZ). The associated ideal I(s,n) Q C[x] 
where x = is the ideal generated by the binomials {x y — x y \ y —t y' £ TZ}. A reaction 

network is prime iff its associated ideal is a prime ideal. 

The following theorem follows from [13] Theorem 4.1, Theorem 5.2], 

Theorem 2. A weakly reversible prime reaction network (S,TZ) has no critical siphons. 

We now recall the mass-action equations which are widely employed for modeling cellular pro¬ 
cesses [29126128(30] in Biology. 

Definition 5 (Mass Action System). A reaction system consists of a reaction network ( S,TZ ) 
and a rate function k : TZ —>• R>o. The mass-action equations for a reaction system are the 
system of ordinary differential equations in concentration variables {xi (t) \ i £ S}: 

x{t) = Yl ky ^ yl x ^ v " y) ( 2 ) 

v->y'e k 

where x(t) represents the vector (xi(t))i^s of concentrations at time t. 

Note that x(t) £ H , so affine translations of H are invariant under the dynamics of Equation [2] 
We recall the well known notions of detailed balanced and complex balanced reaction system. 
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Definition 6 . A reaction system (S,tZ,k) is 

1. Detailed balanced iff it is reversible and there exists a point a G R> 0 such that for every 

y y' £ 72 : ; 

ky^y' a v (y' -y) = k v >^ v a v (y - y') 

A point a £ R> 0 that satisfies the above condition is called a point of detailed balance. 

2. Complex balanced iff there exists a point a £ Rf o such that for every y £ Z 9 0 : 

Y ky^v 1 aV W ~y)= Y k y"^y aV " ( y ~ v "^> 

y^ty'^U y"^>yeTl 

A point a £ R> 0 that satisfies the above condition is called a point of complex balance. 

The following observations are well known and easy to verify. 

— A complex balanced reaction system is always weakly reversible. 

— If all rates = 1 and the network is weakly reversible then the reaction system is complex 

balanced with point of complex balance (1,1,..., 1) £ R 9 ; if the network is reversible then the 
reaction system is also detailed balanced with point of detailed balance (1,1,..., 1) £ R 9 . 

— Every detailed balance point is also a complex balance point, but there are complex balanced 
reversible networks that are not detailed balanced. 

It is straightforward to check that every point of complex balance (respectively, detailed balance) 
is a fixed point for Equation[2] The next theorem, which follows from [2J Theorem 2] and 115| . states 
that a converse also exists: if a reaction system is complex balanced (respectively, detailed balanced) 
then every fixed point is a point of complex balance (detailed balance). Further there is a unique 
fixed point in each affine translation of Ft, and if there are no critical siphons then the basin of 
attraction for this fixed point is as large as possible, namely the intersection of the affine translation 
of H with the nonnegative orthant. 

Theorem 3 (Global Attractor Theorem for Complex Balanced Reaction Systems with 
no critical siphons). Let (5,72, k) be a weakly reversible complex balanced reaction system with 
no critical siphons and point of complex balance a. Fix a point u £ R 9 0 . Then there exists a point 
of complex balance ft in (u + H) n R 9 0 such that for every trajectory x(t) with initial conditions 
a:(0) £ (u + H) D R> 0 , the limit lim^oo x(t) exists and equals ft. Further the function g(x) := 
]G" =1 x i^°S x i ~ x i ~ Xilogai is strictly decreasing along non-stationary trajectories and attains its 
unique minimum value in (u + Ft) n R> 0 at ft. 

It is not completely trivial to show, but nevertheless true, that this theorem holds with weakly 
reversible replaced by “reversible” and “complex balance” replaced by “detailed balance.” What is 
to be shown is that the point of complex balance obtained in (u + Ft) n ®>o by minimizing g(x) 
is actually a point of detailed balance, and this follows from an examination of the form of the 
derivative fjjg(x(t)) along trajectories x(t) to Equation^ 

3.2 A Perturbatively-Stable Global Attractor Theorem 

Global attractor results usually assume that the reaction network is weakly reversible. We are going 
to describe our scheme in the next section. Our scheme will employ reaction networks that are not 
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weakly reversible, yet we will prove global attractor results for them. The key idea we use is that 
our reaction network can be broke into a reversible part, and an irreversible part. The reversible 
part acts on, but evolves independent of, the irreversible part. So we get to use the global attractor 
results “as is” on the reversible part. Further, as the reversible part approaches equilibrium, our 
irreversible part behaves as a perturbation of a reversible detailed-balanced network. The closer the 
reversible part gets to equilibrium, the smaller the perturbation of the irreversible part from the 
dynamics of a certain reversible detailed-balanced network. 

To make this proof idea work out, we will need a perturbative version of Theorem [3] The next 
lemma shows that if the rates are perturbed slightly then, outside a small neighborhood of the 
detailed balance point, the strict Lyapunov function g{x) from Theorem [3] continues to decrease 
along non-stationary trajectories. 

Lemma 1. Let (S, 1Z, k) be a weakly reversible complex balanced reaction system with no critical 
siphons and point of complex balance a. For every sufficiently small e > 0 there exists S > 0 such 
that for all x' outside the e-neighborhood of a in (a + H) n the derivative -^g(x(t))\t=o < —<5, 
where x(t) is a solution to the Mass-Action Equations^ with x(0) = x'. 

Proof. Let B e be the open e ball around a in (a + H) n Rfo, with e small enough so that B e 
does not meet the boundary <9R> 0 . Consider the closed set S := {a + H) (~l R > 0 \ -Sf Define the 
orbital derivative of g at x' as Okg{x') := ■j- t g{x{t))\t=o 1 where x(t) is a solution to the mass-action 
equations [2] with x(0) = x' . Define 6 := inf x / e g(— Okg{x')). If 5 < 0 then since S' is a closed set, 
and Okg is a continuous function, there exists a point x ' such that Okg{x') > 0 , which contradicts 
Theorem [3] 

We formalize the notion of perturbation using differential inclusions. Recall that differential 
inclusions model uncertainty in dynamics in a nondeterministic way by generalizing the notion of 
vector held. A differential inclusion maps every point to a subset of the tangent space at that point. 

Definition 7. Let (S,lZ 7 k) be a reaction system and let 5 > 0. The 5-perturbation of (S, 1Z, k) is 
the differential inclusion V : ®>o 2 R that at point x £ R > 0 takes the value 



,x v (y' - y) 


k' 


y^>y' 


£ (ky^yl — 6 , ky^yf + S) fOT all J/ —» £ 1Z 


A trajectory of V is a tuple (/, x) where I C R is an interval and x : I —> K > 0 is a differentiable 
function with x(t) £ V(x(t)). 

Theorem 4 (Perturbatively-Stable Global Attractor Theorem for Complex Balanced 
Reaction Systems with no critical siphons). Let {S 1 TZ 1 k) be a weakly reversible complex bal¬ 
anced reaction system with no critical siphons. Fix a point u £ R® 0 . Then there exists a point of 
complex balance (3 in (u + Ft) C M > 0 such that: 

1. For every sufficiently small e > 0, there exists S > 0 such that every trajectory of the form 
(M>o,a;) to the 5-perturbation of (S 7 1Z, k) with initial conditions x(0) £ (u + H) nRf 0 eventually 
enters an e-neighborhood of (3 and never leaves. 

2. Consider a sequence <5i > 82 > • • • > 0 and a sequence 0 < t\ < £2 < • • • such that lim^oo Si = 0 
and limj-^oo ti = + 00 , and a trajectory (K>o, x) with x(0) £ (u + H) PI Rf 0 such that ((ti,oo),x) 
is a trajectory of the 6 i-perturbation of (S,lZ,k). Then the limit lim x(t) = f3. 



Proof (Proof sketch). 1. Fix e > 0 such that the e-ball B e around /3 does not meet the boundary 
By Lemma [TJ outside B e , there exists 5 e > 0 such that the function (Dug < —S e . Since Okg is 
a continuous function of the specific rates k , a sufficiently small perturbation S > 0 in the rates will 
not change the sign of Okg. Hence, outside B e , the function g is strictly decreasing along trajectories 
x(t) to Equation [2] It follows that eventually every trajectory must enter B e . 

2. Fix a sequence £i > e 2 > • • • > 0 with e 1 small enough so that the £i-ball around /3 does not 
meet the boundary dKf 0 and Hindoo £, —>■ 0. For each e*, there exists j such that Sj is small enough 
as per part (1) of the theorem. So every trajectory will eventually enter the Ci neighborhood of /3, 
and never leave. Since this is true for every i and linij^^ —>• 0, the result follows. 

4 Main Result 

The next definition makes precise our scheme, which takes a design matrix A to a reaction system 
Smle depending on A. The choice of this reaction system is not unique, but depends on two choices 
of basis. We proceed in two stages. In the first stage, we construct the reaction system Smld which 
solves the problem of finding the maximum likelihood distribution. In the second stage, we add 
reactions to solve for 9 from the algebraic relations between the 9 and X variables, obtaining Smle- 

Definition 8. Fix a design matrix A = (aij) mxn , a basis B for the free group TP 0 kerH, and a 
maximal linearly-independent subset B' of the columns of A. 

1. The reaction network 1 Zmld(A, B ) consists of n species Xi, X 2 , ■ ■ ■, X n and for each b £ B, the 
reversible reaction: 

11 h > X J ^ 11 ~ b i X i 

j:bj >0 j:bj <0 

2. The reaction system Smld(A, B) consists of the reaction network TZmld{A, B) with an assign¬ 
ment of rate 1 to each reaction. 

3. The reaction network TZmle{A, B , B ') consists of m + n species 6\, 6*2 ,..., 9 m , Xi, X 2 , ■ • •, X n , 
and in addition to the reactions in TZmld, the following reactions: 

— For each column j £ B' of A, a reaction a ij@i 0- 

— For each column j € B' of A, a reaction Xj —> Xj + aij9i. 
f. The reaction system Smle(A, B , B') consists of the reaction network 1 Zmle{A 7 B,B') with an 
assignment of rate 1 to each reaction. 

Note that by the rank-nullity theorem of linear algebra, the dimension of the kernel plus the rank 
of the matrix equals the number of columns of the matrix. Hence counting the reversible reactions 
as two irreversible reactions, our scheme yields a reaction system whose number of reactions is twice 
the number of columns of A. 

It is clear from the definition of Smle that the reactions that come from TZmld are reversible and 
evolve without being affected by the other reactions. Hence we first prove global convergence of the 
reaction system Smld to the maximum likelihood distribution. This part is fairly straightforward. 
The key point is to verify that the reaction network TZmld has no critical siphons. In fact, we show 
in the next lemma that TZmld is prime, which will imply “no critical siphons” by Theorem [21 

Lemma 2. Fix a design matrix A = (aij) mxn and a basis B for the free group TP (~l kerH. Then 
the reaction network TZmld {A, B) is prime and Smld(A, B) is detailed balanced. Consequently, the 
reaction system Smld(A, B) is globally asymptotically stable. 
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Proof. 1 Zmld(A, B) is prime by [T7J Corollary 2.15]. The idea is to look at the toric model pa as 
a ring homomorphism C[xi, £ 2 , ■ • ■ ,x n ] —>• C[FL4] with Xj 1 —>• 9 aj . (Here NA is the affine semigroup 
generated by the columns of A.) The kernel of this ring homomorphism is the associated ideal of 
P-mld(A, B) by |T7] Proposition 2.14], and the codomain is an integral domain, so the kernel must 
be prime. 

To verify that Smld(A, B ) is detailed balanced, note that the point (1,1 ,..., 1) £ R” is a point 
of detailed balance since all rates are 1. Global asymptotic stability now follows from Theorem [2] 
and Theorem [3] 

We can now obtain global convergence for Smld- 

Theorem 5 (The reaction system Smld(A, B) computes the Maximum Likelihood Dis¬ 
tribution). Fix a design matrix A = ( aij) mxn , a basis B for the free group Z" n ker A, and a 
nonzero data vector u £ Z> 0 . Let x{t) = (xi(t),X 2 (t),... ,x n (t)) be a solution to the mass-action 
differential equations for the reaction system Smld(A, B) with initial conditions x(0) = w/|u|i. Then 
x(oo) := lim x(t) exists and equals the maximum likelihood distribution p(u). 

t—f OO 

Proof. For the system Smld (A, B), note that (x(0) + H) flR" 0 = Pa{u/\u\{). By Theorem [31 a:(oo) 
exists, and the function 1 Xi l°g x i — Xi — Xi log 1 attains its unique minimum in Pa(u/\u\i) at 
x(oo). Since the system is mass-conserving, x i is constant on Pa{u/\u\\), so this is equivalent 

to the fact that Shannon entropy H{x) = — ^” =1 Xi logXj is increasing, and attains its unique 
maximum value in Pa(u/\u\i) at x(oo). By Theorem [T] the point x(oo) must be the maximum 
likelihood distribution p(u). 

As the reversible reactions in Smle approach closer and closer to equilibrium, we wish to absorb 
the values of the X variables into reaction rates and pretend that the irreversible reactions are 
reactions only in the 9 variables. This has the advantage that we can treat this pretend reaction 
system in the 9 variables as a perturbation of a reversible, detailed balanced system. We can then 
hope to employ Theorem [4] and conclude global convergence for these irreversible reactions, and 
hence for Smle- 

One small technical point deserves mention. The pretend reaction system in the 9 variables is not 
a reaction system since the rates are not real numbers but functions of time. This will not trouble us. 
We have already provisioned for this in Definition [7] by allowing perturbations of reaction systems 
to be differential inclusions. 

Theorem 6 (The reaction system Smle{A , B, B’) computes the Maximum Likelihood Es¬ 
timator). Fix a design matrix A = (aij) mxn , a basis B for the free group Z™nker A, and a nonzero 
data vector u £ Z" 0 . Let x(t) = (aq(i), ^(t),..., x n (t), 9\(t), 02(i), ■ • • > 9 m (t)) be a solution to the 
mass-action differential equations for the reaction system Smle{A, B, B 1 ) with initial conditions 
x(0) = it/|u|i and 9(0) = 0. Then x(oo) := lim^oo x{t) exists and equals the maximum likelihood 
distribution p(u), and 9( 00 ) := lim^oo 9(t ) exists and equals the maximum likelihood estimator 9(u). 

Proof (Proof sketch). Fix u and let p = p[u) and 9 = 9(u). Note that for the species X\, X 2 ,.. ■, X n . 
the differential equations for Smle{A, B) and Smld(A, B, B') are identical, since these species 
appear purely catalytically in the reactions that belong to 1Zmle(A, B, B') \TZmld{A,B). Hence 
x(oo) = p(u) follows from Theorem [5} 

To see that 9(oo) = 9 , let us first allow the X species to reach equilibrium, then treat the 9 
system with replacing the X species by rate constants representing their values at equilibrium. The 
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system Omle(A, B, B', x(oo)) obtained in this way in only the 9 species is a reaction system with 
the reactions 

— For each column j £ B' of A, a reaction a ij^i ~► 0 of rate 1 

— For each column j £ B' of A, a reaction 0 —> Y^iLi a ij^i of rate Xj( oo). 

This is a reversible reaction system, and the maximum likelihood estimators 9 are precisely the points 
of detailed balance for this system, where we are using the fact that B' was a maximal linearly- 
independent set of the columns of A. In addition, this system has no siphons since if species 9i is 
absent, and > 0 then 9i will immediately be produced by the reaction 0 —>• ^™ =1 (We are 

assuming A has no 0 row. If A has a 0 row, we can ignore it anyway.) It follows from Theorem [3] that 
this system is globally asymptotically stable, and every trajectory approaches a maximum likelihood 
estimator 9. 

Our actual system may be viewed as a perturbation of the system 6 >mle(A, B , B\ x(oo)). Con¬ 
sider any trajectory (x(t),9(t)) to Smle(A, B, B') starting at (u/|iz|i,0). We are going to consider 
the projected trajectory (R>,0). We now show that it is possible to choose appropriate t t and Si so 

that ((fj, oo), 0(f)) is a trajectory of a ^-perturbation of 0mle(A, B, B', x(oo)), for i = 1, 2,_ 

Wait for a sufficiently large time fi till x(t) is in a sufficiently small neighborhood of a:(oo) 
which it will never leave. After this time, we obtain a differential inclusion in the 9 species with the 
mass-action equations [2] for the reactions 

— For each column j of A, a reaction l a v@i —t 0 of rate 1 

— For each column j of A , a reaction 0 —> Xa=i a ij@i with time-varying rate lying in the interval 
(xj( oo) — <5i, Xj(oo) + di). 

Continuing in this way, we choose a decreasing sequence <5i > 62 > • ■ • > 0 with lim^oo Si —> 0, 
and corresponding times fi < t 2 < £3... with lim^oo t, — > 00 such that after time tj, x(t) is 
in a Si neighborhood of x(oo) which it will never leave. Then ((tj, 00 ), 9(t)) is a trajectory of the 
(^-perturbation of Omle{A, B, B', x(oo)). Hence 9(t) satisfies the conditions of Theorem H) Hence 
lim^oo 9(t) = 9. 


5 Related Work and Conclusions 

The mathematical similarities of both log-linear statistics and reaction networks to toric geometry 
have been pointed out before EH33- Craciun et al. [9] refer to the steady states of complex-balanced 
reaction networks as Birch points “to highlight the parallels” with algebraic statistics. This paper 
develops on these observations, and serves to flesh out this mathematical parallel into a scheme for 
molecular computation. 

Various building blocks for molecular computation that assume mass-action kinetics have been 
proposed before. We briefly review some of these proposals. 

In [19] . Napp and Adams model molecular computation with mass-action kinetics, as we do here. 
They propose a molecular scheme to implement message passing schemes in probabilistic graphical 
models. The goal of their scheme is to convert a factor graph into a reaction network that encodes the 
single-variable marginals of the joint distribution as steady state concentrations. In comparison, the 
goal of our scheme is to do statistical inference and compute maximum likelihood estimators for log- 
linear models. Napp and Adams focus on the “forward model” task of how a given data-generating 
process (a factor graph) can lead to observed data, whereas our focus is on the “backward model” 
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task of inference, going from the observed data to the data-generating process. Further our scheme 
couples the deep role that MaxEnt algorithms play in Machine Learning with MaxEnt’s roots in 
the Second Law of Thermodynamics whereas Napp and Adams are drawing their inspiration from 
variable elimination implemented via message passing which has its roots in Boolean constraint 
satisfaction problems. 

Qian and Winfree |23l24j have proposed a DNA gate motif that can be composed to build large 
circuits, and have experimentally demonstrated molecular computation of a Boolean circuit with 
around 30 gates. In comparison, our scheme natively employs a continuous-time dynamical system 
to do the computation, without a Boolean abstraction. 

Taking a control theory point of view, Oislii and Klavins [20] have proposed a scheme for imple¬ 
menting linear input/output systems with reaction networks. Note that for a given matrix A, the 
set of maximum likelihood distributions is usually not linear, but log-linear. 

Daniel et al. [10l have demonstrated an in vivo implementation of feedback loops, exploiting 
analogies with electronic circuits. It is possible that the success of their schemes is also related to 
the toric nature of mass-action kinetics. 

Buisman et al. [5] have proposed a reaction network scheme for computation of algebraic func¬ 
tions. The part of our scheme which reads out the maximum likelihood estimator from the maximum 
likelihood distribution bears some similarity to their work. 

One limitation of our present work is that the number of columns of the matrix A can become 
very large, for example 2^' I for a graphical model with V nodes. Since the number of species and 
number of reactions both depend on the number of columns of A, this can require an exponentially 
large reaction network which may become impractical. One direction for future work is to extend our 
scheme by specifying a reaction network that computes maximum likelihood for graphical models. 

We have some freedom in our scheme in the choice of basis sets B and B'. In any chemical 
implementation of this work, there might be opportunity for optimization in choice of basis. 

Acknowledgements: I thank Nick S. Jones, Anne Shiu, Abhishek Behera, Ezra Miller, Thomas 
Ouldridge, Gheorghe Craciun, and Bence Melykuti for useful discussions. 
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